Collection Selection with Highly Discriminative Keys
Abstract
The centralized web search paradigm introduces several problems: crawling requires heavy data traffic, the index is hard to keep fresh, and it is difficult to index everything. In this study, we examine collection selection using highly discriminative keys and query-driven indexing as part of a distributed web search system. The approach is evaluated on different splits of the TREC WT10g corpus. Experimental results show that it outperforms a Dirichlet-smoothing language-modeling approach to collection selection, provided that web servers index their own local content.
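The abstract does not give the key-generation or ranking details. The following is a minimal Python sketch of the two collection-selection styles it compares, under our own simplifying assumptions. Keys have at most two terms. A key counts as highly discriminative when it occurs in at most `df_max` documents. Query-driven indexing simply keeps keys contained in some logged query. Collections are ranked by matching-document counts. The baseline treats each collection as one large document scored by Dirichlet-smoothed query likelihood. All function names, parameters (`df_max`, `window`, `mu`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sketch under simplifying assumptions; not the paper's implementation.


def doc_keys(doc, frequent, window):
    """Candidate keys for one document: every single term, plus pairs of
    frequent terms that co-occur within `window` positions (key size <= 2)."""
    keys = {frozenset([t]) for t in doc}
    for i, a in enumerate(doc):
        for b in doc[i + 1:i + 1 + window]:
            if a != b and a in frequent and b in frequent:
                keys.add(frozenset((a, b)))
    return keys


def build_hdk_index(collections, df_max, window=3, query_log=None):
    """Map each highly discriminative key to {collection: #documents with key}."""
    term_df = Counter(t for docs in collections.values() for d in docs for t in set(d))
    frequent = {t for t, n in term_df.items() if n > df_max}
    key_docs = defaultdict(Counter)
    for name, docs in collections.items():
        for d in docs:
            for k in doc_keys(d, frequent, window):
                key_docs[k][name] += 1
    # Keep only keys whose global document frequency is at most df_max.
    index = {k: c for k, c in key_docs.items() if sum(c.values()) <= df_max}
    if query_log is not None:
        # Query-driven indexing (simplified assumption): keep keys that are
        # contained in at least one past query.
        index = {k: c for k, c in index.items()
                 if any(k <= set(q) for q in query_log)}
    return index


def hdk_select(index, query):
    """Rank collections by the number of matching documents over all query keys."""
    terms = list(dict.fromkeys(query))  # deduplicate, keep order
    candidates = [frozenset([t]) for t in terms]
    candidates += [frozenset(p) for p in combinations(terms, 2)]
    scores = Counter()
    for k in candidates:
        scores.update(index.get(k, {}))
    return [name for name, _ in scores.most_common()]


def dirichlet_select(collections, query, mu=1000.0):
    """Baseline: score each collection, treated as one large document, by its
    Dirichlet-smoothed query log-likelihood."""
    tf = {name: Counter(t for d in docs for t in d) for name, docs in collections.items()}
    global_tf = sum(tf.values(), Counter())
    total = sum(global_tf.values())
    scores = {}
    for name, counts in tf.items():
        length = sum(counts.values())
        scores[name] = sum(
            math.log((counts[t] + mu * global_tf[t] / total) / (length + mu))
            for t in query if global_tf[t] > 0)  # skip terms unseen everywhere
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: each collection stands for one web server's local documents.
collections = {
    "server-a": [["language", "model", "smoothing"], ["web", "search", "engine"]],
    "server-b": [["distributed", "web", "search"], ["web", "crawling"]],
    "server-c": [["web", "crawling", "index"], ["search", "engine", "index"]],
}
query = ["distributed", "web", "search"]
index = build_hdk_index(collections, df_max=2, query_log=[query])
print(hdk_select(index, query))                  # ['server-b', 'server-a']
print(dirichlet_select(collections, query)[0])   # server-b
```

In this toy setup the frequent terms `web` and `search` are never indexed on their own; only their rarer combination `{web, search}` becomes a key. That is the intuition behind highly discriminative keys: frequent terms are combined until the resulting key is rare enough to index compactly.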
Similar resources
Collection Selection for Distributed Web Search Using Highly Discriminative Keys, Query-driven Indexing and ColRank
To my parents, who have always supported me. Summary: Current popular web search engines, such as Google, Live Search and Yahoo!, rely on crawling to build an index of the World Wide Web. Crawling is a continuous process to keep the index fresh and generates an enormous amount of data traffic. By far the largest part of the web remains unindexed, because crawlers are unaware of the existence of ...
Building a peer-to-peer full-text Web search engine with highly discriminative keys
Web search engines designed on top of peer-to-peer (P2P) overlay networks show promise to enable attractive search scenarios operating at a large scale. However, the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low maintenance costs are favorable properties of P2P over...
On the Dissimilarity Representation and Prototype Selection for Signature-Based Bio-cryptographic Systems
Robust bio-cryptographic schemes employ encoding methods where a short message is extracted from biometric samples to encode cryptographic keys. This approach implies design limitations: 1) the encoding message should be concise and discriminative, and 2) a dissimilarity threshold must provide a good compromise between false rejection and acceptance rates. In this paper, the dissimilarity repre...
Selection of an Optimal Set of Discriminative and Robust Local Features with Application to Traffic Sign Recognition
Today, discriminative local features are widely used in different fields of computer vision. Due to their strengths, discriminative local features were recently applied to the problem of traffic sign recognition (TSR). First of all, we discuss how discriminative local features are applied to TSR and which problems arise in this specific domain. Since TSR has to cope with highly structured and s...
Discriminative Feature Selection via Multiclass Variable Memory Markov Model
We propose a novel feature selection method based on a variable memory Markov (VMM) model. The VMM was originally proposed as a generative model trying to preserve the original source statistics from training data. We extend this technique to simultaneously handle several sources, and further apply a new criterion to prune nondiscriminative features out of the model. This results in a multi...